An Expectation Maximization Algorithm for Textual Unit Alignment

Authors

  • Radu Ion
  • Alexandru Ceausu
  • Elena Irimia
Abstract

This paper presents an Expectation Maximization (EM) algorithm for automatically generating parallel and quasi-parallel data from comparable corpora of any degree of comparability, from parallel to weakly comparable. Specifically, we address the problem of extracting related textual units (documents, paragraphs or sentences), relying on the hypothesis that, in a given corpus, certain pairs of translation equivalents are better indicators of a correct textual unit correspondence than others. We evaluate our method on mixed types of bilingual comparable corpora in six language pairs, obtaining state-of-the-art accuracy figures.

Publication date: 2011